AITopics | compression technique

compression technique

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accuracy is Not All You Need

Neural Information Processing Systems, Mar-22-2026, 17:34:13 GMT

When Large Language Models (LLMs) are compressed using techniques such as quantization, the predominant way to demonstrate the validity of such techniques is to measure the compressed model's accuracy on various benchmarks. If the accuracies of the baseline and compressed models are close, it is assumed that there was negligible degradation in quality. However, even when the accuracies of the baseline and compressed models are similar, we observe the phenomenon of flips, wherein answers change from correct to incorrect and vice versa in roughly equal proportion. We conduct a detailed study of metrics across multiple compression techniques, models, and datasets, demonstrating that the behavior of compressed models as visible to end users is often significantly different from that of the baseline model, even when accuracy is similar. We further evaluate compressed models qualitatively and quantitatively using MT-Bench and show that compressed models exhibiting high flips are worse than baseline models in this free-form generative task. Thus, we argue that accuracy and perplexity are necessary but not sufficient for evaluating compressed models, since these metrics hide large underlying changes that have not been observed by previous work. Hence, compression techniques should also be evaluated using distance metrics. We propose two such distance metrics, KL-Divergence and flips, and show that they are well correlated.
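
The two distance metrics named above can be illustrated with a short Python sketch. It assumes that flips are counted as the fraction of questions whose correctness differs between the baseline and the compressed model, and that KL-Divergence is computed per question between the two models' probability distributions over the same answer options, then averaged over questions. The function names `flip_rate`, `kl_divergence`, and `mean_kl` are hypothetical, and the toy scores are made up for illustration; they are not results from the paper.

```python
import math
from typing import List, Sequence


def flip_rate(baseline_correct: Sequence[bool], compressed_correct: Sequence[bool]) -> float:
    """Fraction of questions whose correctness changes in either direction."""
    if len(baseline_correct) != len(compressed_correct):
        raise ValueError("both models must be scored on the same questions")
    flips = sum(b != c for b, c in zip(baseline_correct, compressed_correct))
    return flips / len(baseline_correct)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats for two distributions over the same answer options."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(baseline_probs: List[List[float]], compressed_probs: List[List[float]]) -> float:
    """Average per-question KL divergence from the baseline to the compressed model."""
    pairs = list(zip(baseline_probs, compressed_probs))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy scores (hypothetical): identical accuracy, yet two answers flipped.
baseline = [True, True, False, True, False, True]
compressed = [True, False, True, True, False, True]
print("baseline accuracy:  ", sum(baseline) / len(baseline))
print("compressed accuracy:", sum(compressed) / len(compressed))
print("flip rate:          ", flip_rate(baseline, compressed))
```

Both toy accuracies come out to 4/6 while the flip rate is 2/6: one answer went from correct to incorrect and another from incorrect to correct, so accuracy alone reports no change.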

artificial intelligence, large language model, natural language

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)

e0e956681b04ac126679e8c7dd706b2e-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 10:41:31 GMT

large language model, machine learning, natural language

Neural Information Processing Systems

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Leisure & Entertainment (0.93)
Education (0.67)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Focused Quantization for Sparse CNNs

Yiren Zhao, Xitong Gao, Daniel Bates, Robert Mullins, Cheng-Zhong Xu

Neural Information Processing Systems, Feb-12-2026, 06:37:57 GMT

Coupled with lossless encoding, we built a compression pipeline that provides CNNs with high compression ratios (CR), low computation cost and minimal loss in accuracy.
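
To make "quantization coupled with lossless encoding" and the compression ratio (CR) concrete, here is a minimal Python sketch. It deliberately swaps in plain uniform quantization and zlib's DEFLATE as the lossless encoder; it is not the paper's focused quantization or its encoder. The names `quantize_uniform` and `compression_ratio` and the synthetic 90%-sparse weights are assumptions for illustration, and CR is taken here as dense float32 bytes divided by compressed bytes.

```python
import random
import struct
import zlib
from typing import List, Tuple


def quantize_uniform(weights: List[float], num_bits: int) -> Tuple[List[int], float, float]:
    """Map each weight to an integer code on a uniform grid spanning [min, max]."""
    if not 1 <= num_bits <= 8:
        raise ValueError("this sketch stores one code per byte, so num_bits must be 1..8")
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Reconstruct approximate weights from their integer codes."""
    return [lo + c * scale for c in codes]


def compression_ratio(weights: List[float], num_bits: int) -> float:
    """Dense float32 size divided by (DEFLATE-compressed codes + dequantization header)."""
    codes, lo, scale = quantize_uniform(weights, num_bits)
    payload = zlib.compress(bytes(codes), 9)  # lossless stage
    header = struct.pack("<ff", lo, scale)    # side information needed to dequantize
    return (4 * len(weights)) / (len(payload) + len(header))


# Synthetic, hypothetical "sparse layer": about 90% of weights are exactly zero.
random.seed(0)
weights = [0.0 if random.random() < 0.9 else random.gauss(0.0, 0.05) for _ in range(10_000)]
print(f"CR at 4 bits: {compression_ratio(weights, 4):.1f}x")
```

Because zero weights dominate a sparse layer, they all map to the same code and the lossless stage compresses that repetition cheaply. Note that uniform rounding does not reconstruct zero exactly unless zero happens to fall on the grid, which is one of the issues a sparsity-aware quantizer has to handle.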

artificial intelligence, machine learning, quantization

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Appendix

Neural Information Processing Systems, Feb-8-2026, 13:05:31 GMT

The computation or estimation of the smoothness matrix $L_i$ requires additional preprocessing.

artificial intelligence, machine learning, quant

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

11715d433f6f8b9106baae0df023deb3-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 02:37:33 GMT

dataset, generalization, sequence

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Neural Information Processing Systems, Dec-24-2025, 08:57:56 GMT

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing compression methods do not scale well to large distributed systems (due to gradient build-up) and/or lack evaluations on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleComp), that (i) leverages similarity in the gradient distribution among learners to provide a commutative compressor and keep communication cost constant regardless of the number of workers, and (ii) includes a low-pass filter in local gradient accumulations to mitigate the impact of large-batch training and significantly improve scalability. Using theoretical analysis, we show that ScaleComp provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleComp has small overheads, directly reduces gradient traffic, and provides high compression rates (70-150x) and excellent scalability (up to 64-80 learners and 10x larger batch sizes than normal training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
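
The two ingredients described in the abstract can be sketched in Python under explicit assumptions. Here a rotating "leader" worker picks the top-k indices of its accumulated gradient and every worker sends its values at those same indices. The per-worker messages can then be summed like an ordinary dense all-reduce of k values, so message size does not grow with the number of workers (the leader also has to broadcast its k indices). The low-pass filter is assumed to be an exponential moving average of the unsent residual with a hypothetical coefficient `beta`, where `beta=1.0` reduces to plain error feedback; the abstract does not give its exact form, and `scalecom_like_step` is a hypothetical name, not the paper's code.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def top_k_indices(values: Vector, k: int) -> List[int]:
    """Indices of the k entries with the largest magnitude."""
    return sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]


def scalecom_like_step(
    grads: List[Vector], residuals: List[Vector], k: int, step: int, beta: float = 1.0
) -> Tuple[Dict[int, float], List[Vector]]:
    """One sparsified communication round across all workers (hypothetical sketch)."""
    # Error feedback: add each worker's unsent residual to its fresh gradient.
    accs = [[r + g for r, g in zip(res, grad)] for res, grad in zip(residuals, grads)]
    # Rotate the leader; relying on one worker's indices assumes gradients are similar.
    leader = step % len(accs)
    idx = top_k_indices(accs[leader], k)
    chosen = set(idx)
    # All workers use the same indices, so summing is a dense reduction of k values.
    aggregated = {i: sum(acc[i] for acc in accs) for i in idx}
    new_residuals = []
    for acc, old in zip(accs, residuals):
        unsent = [0.0 if i in chosen else a for i, a in enumerate(acc)]
        # Assumed low-pass filter: exponential moving average of the unsent residual.
        new_residuals.append([(1.0 - beta) * o + beta * u for o, u in zip(old, unsent)])
    return aggregated, new_residuals


grads = [[0.1, -2.0, 0.3, 0.0], [0.2, -1.5, 0.1, 0.4]]
residuals = [[0.0] * 4 for _ in grads]
aggregated, residuals = scalecom_like_step(grads, residuals, k=1, step=0)
print(aggregated, residuals)
```

Entries that the leader did not select stay in each worker's residual and are retried in later rounds, which is why the sketch keeps per-worker error feedback.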

communication-efficient, name change, scalable sparsified gradient compression

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Self-Supervised Generative Adversarial Compression

Neural Information Processing Systems, Dec-24-2025, 02:22:55 GMT

Deep learning's success has led to ever larger models that handle increasingly complex tasks; trained models often contain millions of parameters. These large models are compute- and memory-intensive, which makes them challenging to deploy under latency, throughput, and storage constraints. Some model compression methods have been successfully applied to image classification and detection or to language models, but there has been very little work on compressing generative adversarial networks (GANs) that perform complex tasks. In this paper, we show that standard model compression techniques, weight pruning and knowledge distillation, cannot be applied to GANs using existing methods. We then develop a self-supervised compression technique that uses the trained discriminator to supervise the training of a compressed generator. We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different compression granularities.
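
Weight pruning, the first of the standard techniques mentioned above, can be sketched as magnitude pruning to a target sparsity. This minimal Python sketch (the name `magnitude_prune` is hypothetical) only produces the sparse weights and their binary mask; the paper's contribution, supervising the training of the pruned generator with the trained discriminator, is not shown.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> Tuple[List[float], List[int]]:
    """Zero the smallest-magnitude fraction `sparsity` of weights; return (pruned, mask)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Rank weights from smallest to largest magnitude and mask out the first n_prune.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask


pruned, mask = magnitude_prune([0.5, -0.05, 0.2, -0.9, 0.01], sparsity=0.4)
print(mask)
```

In a prune-then-retrain loop the mask would typically be held fixed, so pruned weights stay at zero while the remaining weights are trained against whatever supervision signal is used.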

electronic proceedings, name change, self-supervised generative adversarial compression

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)

A Systematic Study of Compression Ordering for Large Language Models

Chhawri, Shivansh, Mahadik, Rahul, Rooj, Suparna

arXiv.org Artificial Intelligence, Nov-26-2025

Large Language Models (LLMs) require substantial computational resources, making model compression essential for efficient deployment in constrained environments. The individual effects of the dominant compression techniques (knowledge distillation, structured pruning, and low-bit quantization) are well studied, but their interactions and optimal sequencing remain unclear. This work systematically examines how these techniques perform both independently and in combination when applied to the Qwen2.5 3B model. We evaluate multiple compression pipelines, including single-technique and proposed three-technique sequences, using perplexity, G-Eval, clarity, prompt alignment, and compression ratio as metrics. Our experiments show that quantization provides the greatest standalone compression, while pruning introduces moderate quality degradation. Critically, the ordering of techniques significantly affects the final model quality: the sequence Pruning, Knowledge Distillation, Quantization (P-KD-Q) yields the best balance, achieving a 3.68x compression ratio while preserving strong instruction-following and language understanding capabilities. Conversely, pipelines applying quantization early suffer severe performance degradation due to irreversible information loss that impairs subsequent training. Overall, this study offers practical insight into designing effective, ordering-aware compression pipelines for deploying LLMs in resource-limited settings.

large language model, machine learning, quantization

arXiv.org Artificial Intelligence

2511.19495

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

COLI: A Hierarchical Efficient Compressor for Large Images

Wang, Haoran, Pei, Hanyu, Lyu, Yang, Zhang, Kai, Li, Li, Fan, Feng-Lei

arXiv.org Artificial Intelligence, Nov-25-2025

The escalating adoption of high-resolution, large-field-of-view imagery amplifies the need for efficient compression methodologies. Conventional techniques frequently fail to preserve critical image details, while data-driven approaches exhibit limited generalizability. Implicit Neural Representations (INRs) present a promising alternative by learning continuous mappings from spatial coordinates to pixel intensities for individual images, thereby storing network weights rather than raw pixels and avoiding the generalization problem. However, INR-based compression of large images faces challenges including slow compression speed and suboptimal compression ratios. To address these limitations, we introduce COLI (Compressor for Large Images), a novel framework leveraging Neural Representations for Videos (NeRV). First, recognizing that INR-based compression constitutes a training process, we accelerate its convergence through a pretraining-finetuning paradigm, mixed-precision training, and reformulation of the sequential loss into a parallelizable objective. Second, capitalizing on INRs' transformation of image storage constraints into weight storage, we implement Hyper-Compression, a novel post-training technique to substantially enhance compression ratios while maintaining minimal output distortion. Evaluations across two medical imaging datasets demonstrate that COLI consistently achieves competitive or superior PSNR and SSIM metrics at significantly reduced bits per pixel (bpp), while accelerating NeRV training by up to 4 times.
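
The rate and distortion metrics quoted above follow standard definitions: PSNR = 10 log10(MAX^2 / MSE), and for an INR codec, where the stored payload is the network's weights rather than pixels, bits per pixel is the total weight storage in bits divided by the pixel count. The Python sketch below assumes exactly those definitions and ignores side information such as architecture metadata; the parameter count, bit width, and image size in the example are hypothetical, and SSIM is omitted.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; max_value is the largest possible pixel value."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_params: int, bits_per_param: float, height: int, width: int) -> float:
    """bpp for an INR codec: stored weight bits divided by the number of pixels."""
    return num_params * bits_per_param / (height * width)


# Hypothetical numbers: a 50,000-parameter network stored at 8 bits for a 2048 x 2048 image.
print(bits_per_pixel(50_000, 8, 2048, 2048))
print(psnr([0.2, 0.4, 0.6], [0.3, 0.5, 0.7]))
```

Under these definitions, anything that shrinks the stored weights (fewer parameters or fewer bits per parameter) lowers bpp directly, which is why a post-training step on the weights can improve the compression ratio of an INR codec.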

artificial intelligence, deep learning, machine learning

arXiv.org Artificial Intelligence

2507.11443

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search

Behdin, Kayhan, Song, Qingquan, Vasudevan, Sriram, Sheng, Jian, Ma, Xiaojing, Zhou, Z, Zhu, Chuanrui, Li, Guoyao, Nguyen, Chanh, Ghosh, Sayan, Sang, Hejian, Baarzi, Ata Fatahi, Ramachandran, Sundara Raman, Wang, Xiaoqing, Lan, Qing, S, Vinay Y, Guo, Qi, Johnson, Caleb, Wang, Zhipeng, Borisyuk, Fedor

arXiv.org Artificial Intelligence, Oct-28-2025

Large Language Models (LLMs) have demonstrated impressive quality when applied to predictive tasks such as relevance ranking and semantic search. However, deploying such LLMs remains prohibitively expensive for industry applications with strict latency and throughput requirements. In this work, we present lessons and efficiency insights from developing a purely text-based decoder-only Small Language Model (SLM) for a semantic search application at LinkedIn. In particular, we discuss model compression techniques such as pruning that allow us to reduce the model size by up to 40% while maintaining accuracy. Additionally, we present context compression techniques that allow us to reduce the input context length by up to 10x with minimal loss of accuracy. Finally, we present practical lessons from optimizing the serving infrastructure for deploying such a system on GPUs at scale, serving millions of requests per second. Taken together, this allows us to increase our system's throughput by 10x in a real-world deployment while meeting our quality bar.
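
Structured pruning is one common way to shrink a model; the abstract does not say which structures were pruned, so this Python sketch assumes width pruning of a single MLP hidden layer. It ranks hidden units by the product of their input-row and output-column L2 norms and deletes the lowest-ranked ones. Removing a hidden unit deletes a row of `w_in` and a column of `w_out`, so the layer's parameter count shrinks proportionally. `prune_mlp_neurons` and its scoring rule are hypothetical illustrations, not LinkedIn's method.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def prune_mlp_neurons(w_in: Matrix, w_out: Matrix, keep_fraction: float) -> Tuple[Matrix, Matrix]:
    """Width-prune one MLP hidden layer.

    w_in:  hidden x d_model, row j computes hidden unit j
    w_out: d_model x hidden, column j reads hidden unit j
    """
    hidden = len(w_in)
    n_keep = max(1, round(hidden * keep_fraction))

    def score(j: int) -> float:
        # Importance of unit j: L2 norm of its input row times L2 norm of its output column.
        row_norm = math.sqrt(sum(x * x for x in w_in[j]))
        col_norm = math.sqrt(sum(row[j] ** 2 for row in w_out))
        return row_norm * col_norm

    # Keep the highest-scoring units, preserving their original order.
    keep = sorted(sorted(range(hidden), key=score, reverse=True)[:n_keep])
    return [w_in[j] for j in keep], [[row[j] for j in keep] for row in w_out]


w_in = [[1.0, 0.0], [0.0, 0.1], [2.0, 2.0], [0.5, 0.5]]
w_out = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
small_in, small_out = prune_mlp_neurons(w_in, w_out, keep_fraction=0.5)
print(small_in, small_out)
```

With `keep_fraction=0.6`, this sketch would remove 40% of the weights in that one layer only; the 40% figure in the abstract refers to overall model size.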

arxiv preprint arxiv, large language model, machine learning

arXiv.org Artificial Intelligence

2510.22101

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
